Overview

Brought to you by YData

Dataset statistics

Number of variables 23
Number of observations 2326383
Missing cells 0
Missing cells (%) 0.0%
Duplicate rows 0
Duplicate rows (%) 0.0%
Total size in memory 408.2 MiB
Average record size in memory 184.0 B
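As a quick consistency check, the average record size should equal the total in-memory size divided by the number of observations. The following is a minimal sketch, assuming that MiB here means 2^20 bytes; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" block.
total_mib = 408.2        # Total size in memory, in MiB
n_rows = 2_326_383       # Number of observations

BYTES_PER_MIB = 1024 ** 2  # assumption: binary mebibytes
avg_record_bytes = total_mib * BYTES_PER_MIB / n_rows

print(f"{avg_record_bytes:.1f} B")  # 184.0 B
```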

Variable types

Text 2
Numeric 8
Categorical 12
DateTime 1

Alerts

XA is highly overall correlated with XB and 6 other fields High correlation
XH is highly overall correlated with XA and 2 other fields High correlation
XJ is highly overall correlated with XA and 2 other fields High correlation
XN is highly overall correlated with XA and 5 other fields High correlation
polypharmacy is highly overall correlated with XA and 9 other fields High correlation
XB is highly overall correlated with XA and 1 other field High correlation
XC is highly overall correlated with XA and 2 other fields High correlation
XD is highly overall correlated with polypharmacy High correlation
XR is highly overall correlated with XA and 1 other field High correlation
XS is highly overall correlated with XN and 1 other field High correlation
XV is highly overall correlated with polypharmacy High correlation
XB is highly imbalanced (75.8%) Imbalance
XD is highly imbalanced (68.7%) Imbalance
XG is highly imbalanced (78.0%) Imbalance
XH is highly imbalanced (52.1%) Imbalance
XM is highly imbalanced (72.4%) Imbalance
XP is highly imbalanced (83.9%) Imbalance
XR is highly imbalanced (68.5%) Imbalance
XS is highly imbalanced (92.7%) Imbalance
XV is highly imbalanced (88.7%) Imbalance
XA has 1621810 (69.7%) zeros Zeros
XC has 1931835 (83.0%) zeros Zeros
XJ has 1775841 (76.3%) zeros Zeros
XL has 1426884 (61.3%) zeros Zeros
XN has 1263277 (54.3%) zeros Zeros
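The imbalance percentages in the alerts are consistent with a normalized-entropy score, 1 - H / log(k), where H is the Shannon entropy of the class frequencies and k is the number of distinct classes. The sketch below is an assumption about how the score can be reproduced, not the profiler's own code; `imbalance_score` is a hypothetical helper, and the counts are the XB value counts reported further down in this document.

```python
import math

def imbalance_score(counts):
    """Return 1 - H / log(k): 0 for perfectly balanced classes, near 1 when one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # illustrative choice for a single-class column
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log(k)

# XB value counts for 0.0, 1.0, 2.0, 4.0, 3.0 and 5.0, as listed in the XB section.
xb_counts = [2057225, 212455, 44962, 7694, 4012, 35]
xb_score = imbalance_score(xb_counts)
print(f"{xb_score:.1%}")  # about 75.8%, in line with the XB alert
```

A score close to 100% means nearly all rows fall into one class, which is why XS (92.7%) and XV (88.7%) head the imbalance alerts.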

Reproduction

Analysis started 2025-04-28 13:35:35.381062
Analysis finished 2025-04-28 13:38:54.475603
Duration 3 minutes and 19.09 seconds
Software version ydata-profiling v4.16.1
Download configuration config.json
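The reported duration follows directly from the two timestamps. A small sketch using only the Python standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2025-04-28 13:35:35.381062")
finished = datetime.fromisoformat("2025-04-28 13:38:54.475603")

elapsed = finished - started
minutes, seconds = divmod(elapsed.total_seconds(), 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")  # 3 minutes and 19.09 seconds
```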

Variables

Distinct 264444
Distinct (%) 11.4%
Missing 0
Missing (%) 0.0%
Memory size 17.7 MiB

Length

Max length 9
Median length 8
Mean length 8.1550819
Min length 7

Characters and Unicode

Total characters 18971844
Distinct characters 12
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 44773
Unique (%) 1.9%

Sample

1st row 10003357
2nd row 10003357
3rd row 10003357
4th row 10003357
5th row 10003357
Value Count Frequency (%)
15997294 3696 0.2%
15333886 2112 0.1%
15759495 2112 0.1%
16008869 2058 0.1%
15614390 1920 0.1%
8439111-2 1700 0.1%
14539518 1518 0.1%
5528013-7 1271 0.1%
10487906 1260 0.1%
15954362 1222 0.1%
Other values (264434) 2307514 99.2%

Most occurring characters

Value Count Frequency (%)
1 2990474 15.8%
5 1869073 9.9%
4 1803500 9.5%
3 1693078 8.9%
9 1691331 8.9%
2 1678102 8.8%
0 1664041 8.8%
8 1631391 8.6%
7 1625053 8.6%
6 1601195 8.4%
Other values (2) 724606 3.8%

Most occurring categories

Value Count Frequency (%)
(unknown) 18971844 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 18971844 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 18971844 100.0%

ade
Text

Distinct 460831
Distinct (%) 19.8%
Missing 0
Missing (%) 0.0%
Memory size 17.7 MiB

Length

Max length 17
Median length 17
Mean length 16.993551
Min length 14

Characters and Unicode

Total characters 39533507
Distinct characters 11
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 238895
Unique (%) 10.3%

Sample

1st row 21602735_36717998
2nd row 21602735_42890355
3rd row 21602735_36718024
4th row 21603927_36717998
5th row 21603927_42890355
Value Count Frequency (%)
21602295_37320158 3841 0.2%
21604559_35506601 3791 0.2%
21602295_36718382 3338 0.1%
21602295_37320109 3302 0.1%
21604559_37522220 2618 0.1%
21602295_37320257 2115 0.1%
21602256_35809005 2059 0.1%
21603911_35809005 2015 0.1%
21602295_37320170 2000 0.1%
21604757_35809304 1898 0.1%
Other values (460821) 2299406 98.8%

Most occurring characters

Value Count Frequency (%)
1 5381572 13.6%
0 5276087 13.3%
2 4853600 12.3%
3 4803584 12.2%
6 4710638 11.9%
5 2789783 7.1%
4 2567553 6.5%
9 2389708 6.0%
_ 2326383 5.9%
7 2227323 5.6%

Most occurring categories

Value Count Frequency (%)
(unknown) 39533507 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 39533507 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 39533507 100.0%

atc_concept_id
Real number (ℝ)

Distinct 1088
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 21914967
Minimum 1588648
Maximum 45893497
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 17.7 MiB

Quantile statistics

Minimum 1588648
5-th percentile 21600484
Q1 21602256
median 21603481
Q3 21604348
95-th percentile 21604757
Maximum 45893497
Range 44304849
Interquartile range (IQR) 2092
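The range and the interquartile range are plain differences of the quantiles listed above. A short sketch with the values copied from this block; the variable names are illustrative.

```python
# Quantiles of atc_concept_id, copied from "Quantile statistics".
minimum, q1, q3, maximum = 1_588_648, 21_602_256, 21_604_348, 45_893_497

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)

print(value_range, iqr)  # 44304849 2092
```

The IQR is tiny relative to the range, so most values sit in a narrow band with a few far-away minimum and maximum values.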

Descriptive statistics

Standard deviation 2677450.9
Coefficient of variation (CV) 0.12217454
Kurtosis 55.220493
Mean 21914967
Median Absolute Deviation (MAD) 952
Skewness 6.6278526
Sum 5.0982606 × 10^13
Variance 7.1687434 × 10^12
Monotonicity Not monotonic
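Two of the descriptive statistics can be derived from the others: the coefficient of variation is the standard deviation divided by the mean, and the variance is the square of the standard deviation. A minimal check using the rounded figures printed above, so agreement is only expected up to rounding; the variable names are illustrative.

```python
# Rounded figures for atc_concept_id, copied from "Descriptive statistics".
std_dev = 2677450.9
mean = 21914967

cv = std_dev / mean        # Coefficient of variation (CV)
variance = std_dev ** 2    # Variance

print(f"CV = {cv:.6f}, variance = {variance:.4e}")
```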
Histogram with fixed size bins (bins=50)

Common Values

Value Count Frequency (%)
21603929 50124 2.2%
21601423 46303 2.0%
21602256 45484 2.0%
21603911 36440 1.6%
21604757 34234 1.5%
21604559 34152 1.5%
21603908 33579 1.4%
21604344 32519 1.4%
21603967 31713 1.4%
21602735 31070 1.3%
Other values (1078) 1950765 83.9%

Minimum 10 values

Value Count Frequency (%)
1588648 6 < 0.1%
1588697 2422 0.1%
21600005 301 < 0.1%
21600008 275 < 0.1%
21600012 176 < 0.1%
21600013 575 < 0.1%
21600019 247 < 0.1%
21600034 712 < 0.1%
21600056 118 < 0.1%
21600082 403 < 0.1%

Maximum 10 values

Value Count Frequency (%)
45893497 146 < 0.1%
45893489 31 < 0.1%
45893488 2044 0.1%
45893476 2 < 0.1%
45893474 485 < 0.1%
45893464 164 < 0.1%
45893463 82 < 0.1%
45893461 101 < 0.1%
45893458 13 < 0.1%
45893267 1 < 0.1%

meddra_concept_id
Real number (ℝ)

Distinct 10770
Distinct (%) 0.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 36498705
Minimum 788090
Maximum 46277190
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 17.7 MiB

Quantile statistics

Minimum 788090
5-th percentile 35204948
Q1 35809005
median 36211473
Q3 36818768
95-th percentile 37622529
Maximum 46277190
Range 45489100
Interquartile range (IQR) 1009763

Descriptive statistics

Standard deviation 2540578.3
Coefficient of variation (CV) 0.069607353
Kurtosis 106.19304
Mean 36498705
Median Absolute Deviation (MAD) 503566
Skewness -6.3448816
Sum 8.4909968 × 10^13
Variance 6.4545378 × 10^12
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value Count Frequency (%)
37522220 35311 1.5%
35809327 32226 1.4%
35809054 30912 1.3%
35708208 29732 1.3%
35708202 23037 1.0%
36718132 21475 0.9%
35708093 18309 0.8%
35708154 16199 0.7%
35809243 15840 0.7%
35205038 15392 0.7%
Other values (10760) 2087950 89.8%

Minimum 10 values

Value Count Frequency (%)
788090 1 < 0.1%
788094 51 < 0.1%
788095 7 < 0.1%
788096 3 < 0.1%
788098 20 < 0.1%
788100 28 < 0.1%
788104 10 < 0.1%
788105 11 < 0.1%
788115 45 < 0.1%
788120 35 < 0.1%

Maximum 10 values

Value Count Frequency (%)
46277190 3 < 0.1%
46277169 5 < 0.1%
46277163 5 < 0.1%
46276846 6 < 0.1%
46276844 22 < 0.1%
46276840 3 < 0.1%
46276826 2 < 0.1%
46276825 19 < 0.1%
46276824 2 < 0.1%
46276815 5 < 0.1%

nichd
Categorical

Distinct 7
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 17.7 MiB
early_adolescence 927023
late_adolescence 475711
middle_childhood 454547
early_childhood 193382
infancy 120377
Other values (2) 155343

Length

Max length 17
Median length 16
Mean length 15.406697
Min length 7

Characters and Unicode

Total characters 35841879
Distinct characters 16
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row middle_childhood
2nd row middle_childhood
3rd row middle_childhood
4th row middle_childhood
5th row middle_childhood

Common Values

Value Count Frequency (%)
early_adolescence 927023 39.8%
late_adolescence 475711 20.4%
middle_childhood 454547 19.5%
early_childhood 193382 8.3%
infancy 120377 5.2%
toddler 94078 4.0%
term_neonatal 61265 2.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value Count Frequency (%)
early_adolescence 927023 39.8%
late_adolescence 475711 20.4%
middle_childhood 454547 19.5%
early_childhood 193382 8.3%
infancy 120377 5.2%
toddler 94078 4.0%
term_neonatal 61265 2.6%

Most occurring characters

Value Count Frequency (%)
e 6475473 18.1%
l 4256669 11.9%
d 3795842 10.6%
c 3573774 10.0%
a 3241757 9.0%
o 2853935 8.0%
_ 2111928 5.9%
n 1766018 4.9%
s 1402734 3.9%
h 1295858 3.6%
Other values (6) 5067891 14.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 35841879 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 35841879 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 35841879 100.0%

sex
Categorical

Distinct 2
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 17.7 MiB
Female 1235999
Male 1090384

Length

Max length 6
Median length 6
Mean length 5.0625929
Min length 4

Characters and Unicode

Total characters 11777530
Distinct characters 6
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Male
2nd row Male
3rd row Male
4th row Male
5th row Male

Common Values

Value Count Frequency (%)
Female 1235999 53.1%
Male 1090384 46.9%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value Count Frequency (%)
female 1235999 53.1%
male 1090384 46.9%

Most occurring characters

Value Count Frequency (%)
e 3562382 30.2%
a 2326383 19.8%
l 2326383 19.8%
F 1235999 10.5%
m 1235999 10.5%
M 1090384 9.3%

Most occurring categories

Value Count Frequency (%)
(unknown) 11777530 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 11777530 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 11777530 100.0%

Distinct 5
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 17.7 MiB
Physician 782542
Other health professional 716161
Consumer or non-health professional 685768
Pharmacist 104806
Lawyer 37106

Length

Max length 35
Median length 25
Mean length 21.586935
Min length 6

Characters and Unicode

Total characters 50219479
Distinct characters 23
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Other health professional
2nd row Other health professional
3rd row Other health professional
4th row Other health professional
5th row Other health professional

Common Values

Value Count Frequency (%)
Physician 782542 33.6%
Other health professional 716161 30.8%
Consumer or non-health professional 685768 29.5%
Pharmacist 104806 4.5%
Lawyer 37106 1.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value Count Frequency (%)
professional 1401929 24.1%
physician 782542 13.5%
other 716161 12.3%
health 716161 12.3%
consumer 685768 11.8%
or 685768 11.8%
non-health 685768 11.8%
pharmacist 104806 1.8%
lawyer 37106 0.6%

Most occurring characters

Value Count Frequency (%)
o 4861162 9.7%
h 4407367 8.8%
s 4376974 8.7%
e 4242893 8.4%
n 4241775 8.4%
a 3833118 7.6%
r 3631538 7.2%
(space) 3489626 6.9%
i 3071819 6.1%
l 2803858 5.6%
Other values (13) 11259349 22.4%

Most occurring categories

Value Count Frequency (%)
(unknown) 50219479 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 50219479 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 50219479 100.0%

Distinct 4770
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Memory size 17.7 MiB
Minimum 1994-02-02 00:00:00
Maximum 2019-03-31 00:00:00
Invalid dates 0
Invalid dates (%) 0.0%
Histogram with fixed size bins (bins=50)

XA
Real number (ℝ)

High correlation  Zeros 

Distinct 12
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 0.65569642
Minimum 0
Maximum 11
Zeros 1621810
Zeros (%) 69.7%
Negative 0
Negative (%) 0.0%
Memory size 17.7 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 1
95-th percentile 3
Maximum 11
Range 11
Interquartile range (IQR) 1

Descriptive statistics

Standard deviation 1.3875397
Coefficient of variation (CV) 2.1161313
Kurtosis 13.272512
Mean 0.65569642
Median Absolute Deviation (MAD) 0
Skewness 3.2488564
Sum 1525401
Variance 1.9252664
Monotonicity Not monotonic
Histogram with fixed size bins (bins=12)

Common Values

Value Count Frequency (%)
0 1621810 69.7%
1 353477 15.2%
2 162982 7.0%
3 76611 3.3%
4 47470 2.0%
5 21878 0.9%
6 14480 0.6%
7 11448 0.5%
10 6596 0.3%
9 5156 0.2%
Other values (2) 4475 0.2%

Minimum 10 values

Value Count Frequency (%)
0 1621810 69.7%
1 353477 15.2%
2 162982 7.0%
3 76611 3.3%
4 47470 2.0%
5 21878 0.9%
6 14480 0.6%
7 11448 0.5%
8 3916 0.2%
9 5156 0.2%

Maximum 10 values

Value Count Frequency (%)
11 559 < 0.1%
10 6596 0.3%
9 5156 0.2%
8 3916 0.2%
7 11448 0.5%
6 14480 0.6%
5 21878 0.9%
4 47470 2.0%
3 76611 3.3%
2 162982 7.0%
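
Because XA takes only 12 distinct values, its summary statistics can be cross-checked from the value counts alone. The sketch below rebuilds the sum, mean, variance and zero share from the counts listed for XA. Using the sample variance (divisor n - 1) is an assumption about the estimator behind the reported figure, and the names are illustrative.

```python
# Value counts for XA (value -> count), copied from the XA tables.
xa_counts = {0: 1621810, 1: 353477, 2: 162982, 3: 76611, 4: 47470, 5: 21878,
             6: 14480, 7: 11448, 8: 3916, 9: 5156, 10: 6596, 11: 559}

n = sum(xa_counts.values())                       # number of observations
total = sum(v * c for v, c in xa_counts.items())  # "Sum" in the report
mean = total / n
# Sample variance with divisor n - 1 (assumed estimator).
variance = sum(c * (v - mean) ** 2 for v, c in xa_counts.items()) / (n - 1)
zeros_pct = 100 * xa_counts[0] / n

print(n, total, round(mean, 8), round(zeros_pct, 1))
```

The counts add up to the 2326383 observations in the overview, and the sum and mean match the 1525401 and 0.65569642 reported for XA.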

XB
Categorical

High correlation  Imbalance 

Distinct 6
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 17.7 MiB
0.0 2057225
1.0 212455
2.0 44962
4.0 7694
3.0 4012

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 7
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 2057225 88.4%
1.0 212455 9.1%
2.0 44962 1.9%
4.0 7694 0.3%
3.0 4012 0.2%
5.0 35 < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value Count Frequency (%)
0.0 2057225 88.4%
1.0 212455 9.1%
2.0 44962 1.9%
4.0 7694 0.3%
3.0 4012 0.2%
5.0 35 < 0.1%

Most occurring characters

Value Count Frequency (%)
0 4383608 62.8%
. 2326383 33.3%
1 212455 3.0%
2 44962 0.6%
4 7694 0.1%
3 4012 0.1%
5 35 < 0.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 6979149 100.0%

XC
Real number (ℝ)

High correlation  Zeros 

Distinct 11
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 0.31425522
Minimum 0
Maximum 10
Zeros 1931835
Zeros (%) 83.0%
Negative 0
Negative (%) 0.0%
Memory size 17.7 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 2
Maximum 10
Range 10
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 0.88681187
Coefficient of variation (CV) 2.821948
Kurtosis 24.397496
Mean 0.31425522
Median Absolute Deviation (MAD) 0
Skewness 4.2289174
Sum 731078
Variance 0.7864353
Monotonicity Not monotonic
Histogram with fixed size bins (bins=11)

Common Values

Value Count Frequency (%)
0 1931835 83.0%
1 221139 9.5%
2 92689 4.0%
3 37441 1.6%
4 25176 1.1%
5 7811 0.3%
6 5485 0.2%
7 2455 0.1%
10 1700 0.1%
8 484 < 0.1%

Minimum 10 values

Value Count Frequency (%)
0 1931835 83.0%
1 221139 9.5%
2 92689 4.0%
3 37441 1.6%
4 25176 1.1%
5 7811 0.3%
6 5485 0.2%
7 2455 0.1%
8 484 < 0.1%
9 168 < 0.1%

Maximum 10 values

Value Count Frequency (%)
10 1700 0.1%
9 168 < 0.1%
8 484 < 0.1%
7 2455 0.1%
6 5485 0.2%
5 7811 0.3%
4 25176 1.1%
3 37441 1.6%
2 92689 4.0%
1 221139 9.5%

XD
Categorical

High correlation  Imbalance 

Distinct 7
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 17.7 MiB
0.0 1851235
1.0 410125
2.0 48958
3.0 12473
4.0 2176
Other values (2) 1416

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 8
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 1851235 79.6%
1.0 410125 17.6%
2.0 48958 2.1%
3.0 12473 0.5%
4.0 2176 0.1%
5.0 1251 0.1%
6.0 165 < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value Count Frequency (%)
0.0 1851235 79.6%
1.0 410125 17.6%
2.0 48958 2.1%
3.0 12473 0.5%
4.0 2176 0.1%
5.0 1251 0.1%
6.0 165 < 0.1%

Most occurring characters

Value Count Frequency (%)
0 4177618 59.9%
. 2326383 33.3%
1 410125 5.9%
2 48958 0.7%
3 12473 0.2%
4 2176 < 0.1%
5 1251 < 0.1%
6 165 < 0.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 6979149 100.0%

XG
Categorical

Imbalance 

Distinct 5
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 17.7 MiB
0.0 2093389
1.0 215437
2.0 16416
3.0 641
5.0 500

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 6
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 2093389 90.0%
1.0 215437 9.3%
2.0 16416 0.7%
3.0 641 < 0.1%
5.0 500 < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value Count Frequency (%)
0.0 2093389 90.0%
1.0 215437 9.3%
2.0 16416 0.7%
3.0 641 < 0.1%
5.0 500 < 0.1%

Most occurring characters

Value Count Frequency (%)
0 4419772 63.3%
. 2326383 33.3%
1 215437 3.1%
2 16416 0.2%
3 641 < 0.1%
5 500 < 0.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 6979149 100.0%

XH
Categorical

High correlation  Imbalance 

Distinct 5
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 17.7 MiB
0.0 1708771
1.0 459754
2.0 124040
3.0 27853
4.0 5965

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 6
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 1.0
2nd row 1.0
3rd row 1.0
4th row 1.0
5th row 1.0

Common Values

Value Count Frequency (%)
0.0 1708771 73.5%
1.0 459754 19.8%
2.0 124040 5.3%
3.0 27853 1.2%
4.0 5965 0.3%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value Count Frequency (%)
0.0 1708771 73.5%
1.0 459754 19.8%
2.0 124040 5.3%
3.0 27853 1.2%
4.0 5965 0.3%

Most occurring characters

Value Count Frequency (%)
0 4035154 57.8%
. 2326383 33.3%
1 459754 6.6%
2 124040 1.8%
3 27853 0.4%
4 5965 0.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 6979149 100.0%

XJ
Real number (ℝ)

High correlation  Zeros 

Distinct 14
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 0.4788846
Minimum 0
Maximum 13
Zeros 1775841
Zeros (%) 76.3%
Negative 0
Negative (%) 0.0%
Memory size 17.7 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 3
Maximum 13
Range 13
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 1.1490155
Coefficient of variation (CV) 2.3993577
Kurtosis 17.782648
Mean 0.4788846
Median Absolute Deviation (MAD) 0
Skewness 3.6609975
Sum 1114069
Variance 1.3202365
Monotonicity Not monotonic
Histogram with fixed size bins (bins=14)

Common Values

Value Count Frequency (%)
0 1775841 76.3%
1 290231 12.5%
2 125571 5.4%
3 62237 2.7%
4 31745 1.4%
5 14973 0.6%
7 10493 0.5%
6 10021 0.4%
8 1678 0.1%
11 1448 0.1%
Other values (4) 2145 0.1%

Minimum 10 values

Value Count Frequency (%)
0 1775841 76.3%
1 290231 12.5%
2 125571 5.4%
3 62237 2.7%
4 31745 1.4%
5 14973 0.6%
6 10021 0.4%
7 10493 0.5%
8 1678 0.1%
9 1001 < 0.1%

Maximum 10 values

Value Count Frequency (%)
13 78 < 0.1%
12 264 < 0.1%
11 1448 0.1%
10 802 < 0.1%
9 1001 < 0.1%
8 1678 0.1%
7 10493 0.5%
6 10021 0.4%
5 14973 0.6%
4 31745 1.4%

XL
Real number (ℝ)

Zeros 

Distinct 13
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 0.94838726
Minimum 0
Maximum 12
Zeros 1426884
Zeros (%) 61.3%
Negative 0
Negative (%) 0.0%
Memory size 17.7 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 1
95-th percentile 5
Maximum 12
Range 12
Interquartile range (IQR) 1

Descriptive statistics

Standard deviation 1.6191105
Coefficient of variation (CV) 1.707225
Kurtosis 4.8838488
Mean 0.94838726
Median Absolute Deviation (MAD) 0
Skewness 2.1625039
Sum 2206312
Variance 2.6215187
Monotonicity Not monotonic
Histogram with fixed size bins (bins=13)

Common Values

Value Count Frequency (%)
0 1426884 61.3%
1 362830 15.6%
2 214569 9.2%
3 122483 5.3%
4 79939 3.4%
5 51489 2.2%
7 27435 1.2%
6 27415 1.2%
8 9447 0.4%
9 2107 0.1%
Other values (3) 1785 0.1%

Minimum 10 values

Value Count Frequency (%)
0 1426884 61.3%
1 362830 15.6%
2 214569 9.2%
3 122483 5.3%
4 79939 3.4%
5 51489 2.2%
6 27415 1.2%
7 27435 1.2%
8 9447 0.4%
9 2107 0.1%

Maximum 10 values

Value Count Frequency (%)
12 95 < 0.1%
11 580 < 0.1%
10 1110 < 0.1%
9 2107 0.1%
8 9447 0.4%
7 27435 1.2%
6 27415 1.2%
5 51489 2.2%
4 79939 3.4%
3 122483 5.3%

XM
Categorical

Imbalance 

Distinct 6
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 17.7 MiB
0.0 1984186
1.0 289738
2.0 41954
3.0 9321
4.0 750

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 7
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 1984186 85.3%
1.0 289738 12.5%
2.0 41954 1.8%
3.0 9321 0.4%
4.0 750 < 0.1%
5.0 434 < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Words

Value Count Frequency (%)
0.0 1984186 85.3%
1.0 289738 12.5%
2.0 41954 1.8%
3.0 9321 0.4%
4.0 750 < 0.1%
5.0 434 < 0.1%

Most occurring characters

Value Count Frequency (%)
0 4310569 61.8%
. 2326383 33.3%
1 289738 4.2%
2 41954 0.6%
3 9321 0.1%
4 750 < 0.1%
5 434 < 0.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
0 4310569 61.8%
. 2326383 33.3%
1 289738 4.2%
2 41954 0.6%
3 9321 0.1%
4 750 < 0.1%
5 434 < 0.1%

Most occurring scripts

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
0 4310569 61.8%
. 2326383 33.3%
1 289738 4.2%
2 41954 0.6%
3 9321 0.1%
4 750 < 0.1%
5 434 < 0.1%

Most occurring blocks

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
0 4310569 61.8%
. 2326383 33.3%
1 289738 4.2%
2 41954 0.6%
3 9321 0.1%
4 750 < 0.1%
5 434 < 0.1%

XN
Real number (ℝ)

High correlation  Zeros 

Distinct 19
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 1.183239
Minimum 0
Maximum 18
Zeros 1263277
Zeros (%) 54.3%
Negative 0
Negative (%) 0.0%
Memory size 17.7 MiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 2
95-th percentile 5
Maximum 18
Range 18
Interquartile range (IQR) 2

Descriptive statistics

Standard deviation 1.9163878
Coefficient of variation (CV) 1.6196118
Kurtosis 10.943332
Mean 1.183239
Median Absolute Deviation (MAD) 0
Skewness 2.7071917
Sum 2752667
Variance 3.6725422
Monotonicity Not monotonic
Histogram with fixed size bins (bins=19)
Value Count Frequency (%)
0 1263277 54.3%
1 396920 17.1%
2 278106 12.0%
3 155760 6.7%
4 89910 3.9%
5 54744 2.4%
6 29638 1.3%
8 15884 0.7%
7 15769 0.7%
9 14466 0.6%
Other values (9) 11909 0.5%

Value Count Frequency (%)
0 1263277 54.3%
1 396920 17.1%
2 278106 12.0%
3 155760 6.7%
4 89910 3.9%
5 54744 2.4%
6 29638 1.3%
7 15769 0.7%
8 15884 0.7%
9 14466 0.6%

Value Count Frequency (%)
18 230 < 0.1%
17 2154 0.1%
16 182 < 0.1%
15 543 < 0.1%
14 679 < 0.1%
13 2091 0.1%
12 1167 0.1%
11 2280 0.1%
10 2583 0.1%
9 14466 0.6%
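
The quantile statistics for XN follow directly from its value counts. As a rough sketch (not taken from the report), the code below walks the cumulative counts and returns the smallest value whose cumulative share reaches each quantile level. This inverse-CDF definition is an assumption; the profiler may interpolate between values, although for this distribution both approaches give the same integers. The helper name `quantile_from_counts` is hypothetical.

```python
# Value counts for XN (values 0..18), copied from the frequency tables.
counts = {
    0: 1263277, 1: 396920, 2: 278106, 3: 155760, 4: 89910,
    5: 54744, 6: 29638, 7: 15769, 8: 15884, 9: 14466,
    10: 2583, 11: 2280, 12: 1167, 13: 2091, 14: 679,
    15: 543, 16: 182, 17: 2154, 18: 230,
}


def quantile_from_counts(counts, q):
    """Return the smallest value whose cumulative count reaches q * n."""
    n = sum(counts.values())
    running = 0
    for value in sorted(counts):
        running += counts[value]
        if running >= q * n:
            return value
    return max(counts)  # only reached through floating-point edge cases


levels = (0.05, 0.25, 0.5, 0.75, 0.95)
quantiles = {q: quantile_from_counts(counts, q) for q in levels}
print(quantiles)
```

Because 54.3% of the rows are zero, the 5th percentile, Q1 and the median all collapse to 0, which is also why the median absolute deviation is 0.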

XP
Categorical

Imbalance 

Distinct 3
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 17.7 MiB
0.0 2229563
1.0 94902
2.0 1918

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 4
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 2229563 95.8%
1.0 94902 4.1%
2.0 1918 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot; image not preserved in this export]

Value Count Frequency (%)
0.0 2229563 95.8%
1.0 94902 4.1%
2.0 1918 0.1%

Most occurring characters

Value Count Frequency (%)
0 4555946 65.3%
. 2326383 33.3%
1 94902 1.4%
2 1918 < 0.1%
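
The character counts above are a direct consequence of the category labels being stored as three-character strings such as "0.0". A minimal sketch, assuming the labels are exactly the strings shown in the Common Values table, rebuilds the character frequencies by weighting each character of a label by that label's row count:

```python
from collections import Counter

# Category labels and row counts for XP, copied from the Common Values table.
value_counts = {"0.0": 2229563, "1.0": 94902, "2.0": 1918}

char_counts = Counter()
for label, count in value_counts.items():
    for ch in label:
        # Each occurrence of a character in a label appears once per row.
        char_counts[ch] += count

total_chars = sum(char_counts.values())
for ch, count in char_counts.most_common():
    print(ch, count, f"{count / total_chars:.1%}")
```

The same reasoning explains why the "." character always accounts for exactly one third of all characters in these float-formatted categorical columns.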

Most occurring categories

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
0 4555946 65.3%
. 2326383 33.3%
1 94902 1.4%
2 1918 < 0.1%

Most occurring scripts

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
0 4555946 65.3%
. 2326383 33.3%
1 94902 1.4%
2 1918 < 0.1%

Most occurring blocks

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
0 4555946 65.3%
. 2326383 33.3%
1 94902 1.4%
2 1918 < 0.1%

XR
Categorical

High correlation  Imbalance 

Distinct 10
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 17.7 MiB
0.0 1854131
1.0 286792
2.0 114389
3.0 39061
4.0 23978
Other values (5) 8032

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 11
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 1854131 79.7%
1.0 286792 12.3%
2.0 114389 4.9%
3.0 39061 1.7%
4.0 23978 1.0%
5.0 6004 0.3%
6.0 1329 0.1%
7.0 406 < 0.1%
9.0 199 < 0.1%
8.0 94 < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot; image not preserved in this export]

Value Count Frequency (%)
0.0 1854131 79.7%
1.0 286792 12.3%
2.0 114389 4.9%
3.0 39061 1.7%
4.0 23978 1.0%
5.0 6004 0.3%
6.0 1329 0.1%
7.0 406 < 0.1%
9.0 199 < 0.1%
8.0 94 < 0.1%

Most occurring characters

Value Count Frequency (%)
0 4180514 59.9%
. 2326383 33.3%
1 286792 4.1%
2 114389 1.6%
3 39061 0.6%
4 23978 0.3%
5 6004 0.1%
6 1329 < 0.1%
7 406 < 0.1%
9 199 < 0.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
0 4180514 59.9%
. 2326383 33.3%
1 286792 4.1%
2 114389 1.6%
3 39061 0.6%
4 23978 0.3%
5 6004 0.1%
6 1329 < 0.1%
7 406 < 0.1%
9 199 < 0.1%

Most occurring scripts

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
0 4180514 59.9%
. 2326383 33.3%
1 286792 4.1%
2 114389 1.6%
3 39061 0.6%
4 23978 0.3%
5 6004 0.1%
6 1329 < 0.1%
7 406 < 0.1%
9 199 < 0.1%

Most occurring blocks

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
0 4180514 59.9%
. 2326383 33.3%
1 286792 4.1%
2 114389 1.6%
3 39061 0.6%
4 23978 0.3%
5 6004 0.1%
6 1329 < 0.1%
7 406 < 0.1%
9 199 < 0.1%

XS
Categorical

High correlation  Imbalance 

Distinct 6
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 17.7 MiB
0.0 2265241
1.0 56154
2.0 4098
3.0 554
4.0 210

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 7
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 2265241 97.4%
1.0 56154 2.4%
2.0 4098 0.2%
3.0 554 < 0.1%
4.0 210 < 0.1%
5.0 126 < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot; image not preserved in this export]

Value Count Frequency (%)
0.0 2265241 97.4%
1.0 56154 2.4%
2.0 4098 0.2%
3.0 554 < 0.1%
4.0 210 < 0.1%
5.0 126 < 0.1%

Most occurring characters

Value Count Frequency (%)
0 4591624 65.8%
. 2326383 33.3%
1 56154 0.8%
2 4098 0.1%
3 554 < 0.1%
4 210 < 0.1%
5 126 < 0.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
0 4591624 65.8%
. 2326383 33.3%
1 56154 0.8%
2 4098 0.1%
3 554 < 0.1%
4 210 < 0.1%
5 126 < 0.1%

Most occurring scripts

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
0 4591624 65.8%
. 2326383 33.3%
1 56154 0.8%
2 4098 0.1%
3 554 < 0.1%
4 210 < 0.1%
5 126 < 0.1%

Most occurring blocks

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
0 4591624 65.8%
. 2326383 33.3%
1 56154 0.8%
2 4098 0.1%
3 554 < 0.1%
4 210 < 0.1%
5 126 < 0.1%

XV
Categorical

High correlation  Imbalance 

Distinct 4
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Memory size 17.7 MiB
0.0 2250558
1.0 67586
2.0 7430
3.0 809

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 6979149
Distinct characters 5
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 2250558 96.7%
1.0 67586 2.9%
2.0 7430 0.3%
3.0 809 < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot; image not preserved in this export]

Value Count Frequency (%)
0.0 2250558 96.7%
1.0 67586 2.9%
2.0 7430 0.3%
3.0 809 < 0.1%

Most occurring characters

Value Count Frequency (%)
0 4576941 65.6%
. 2326383 33.3%
1 67586 1.0%
2 7430 0.1%
3 809 < 0.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most frequent character per category

(unknown)
Value Count Frequency (%)
0 4576941 65.6%
. 2326383 33.3%
1 67586 1.0%
2 7430 0.1%
3 809 < 0.1%

Most occurring scripts

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most frequent character per script

(unknown)
Value Count Frequency (%)
0 4576941 65.6%
. 2326383 33.3%
1 67586 1.0%
2 7430 0.1%
3 809 < 0.1%

Most occurring blocks

Value Count Frequency (%)
(unknown) 6979149 100.0%

Most frequent character per block

(unknown)
Value Count Frequency (%)
0 4576941 65.6%
. 2326383 33.3%
1 67586 1.0%
2 7430 0.1%
3 809 < 0.1%

polypharmacy
Real number (ℝ)

High correlation 

Distinct 49
Distinct (%) < 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 5.6192979
Minimum 1
Maximum 64
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 17.7 MiB

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
median 4
Q3 7
95-th percentile 16
Maximum 64
Range 63
Interquartile range (IQR) 5

Descriptive statistics

Standard deviation 6.1094532
Coefficient of variation (CV) 1.0872272
Kurtosis 18.941564
Mean 5.6192979
Median Absolute Deviation (MAD) 2
Skewness 3.5236648
Sum 13072639
Variance 37.325419
Monotonicity Not monotonic
Histogram with fixed size bins (bins=49)
Value Count Frequency (%)
1 411627 17.7%
2 365244 15.7%
3 301086 12.9%
4 248700 10.7%
5 195340 8.4%
6 160134 6.9%
7 120414 5.2%
8 98368 4.2%
9 72585 3.1%
10 57060 2.5%
Other values (39) 295825 12.7%

Value Count Frequency (%)
1 411627 17.7%
2 365244 15.7%
3 301086 12.9%
4 248700 10.7%
5 195340 8.4%
6 160134 6.9%
7 120414 5.2%
8 98368 4.2%
9 72585 3.1%
10 57060 2.5%

Value Count Frequency (%)
64 2112 0.1%
53 106 < 0.1%
51 663 < 0.1%
48 432 < 0.1%
47 1034 < 0.1%
46 1012 < 0.1%
45 135 < 0.1%
44 6424 0.3%
43 86 < 0.1%
42 3066 0.1%

Interactions

[Figures: pairwise interaction plots (64 images); not preserved in this export]

Correlations

[Figure: correlation heatmap]
atc_concept_id meddra_concept_id XA XB XC XD XG XH XJ XL XM XN XP XR XS XV polypharmacy
atc_concept_id 1.000 0.001 -0.031 0.009 -0.007 -0.040 -0.006 -0.037 -0.039 -0.020 -0.032 -0.013 -0.015 -0.029 -0.009 -0.016 -0.048
meddra_concept_id 0.001 1.000 -0.021 -0.017 -0.021 -0.019 -0.008 -0.032 -0.020 -0.041 -0.008 -0.006 -0.015 0.002 -0.009 -0.019 -0.036
XA -0.031 -0.021 1.000 0.394 0.310 0.377 0.068 0.318 0.340 0.133 0.322 0.412 0.210 0.399 0.162 0.309 0.767
XB 0.009 -0.017 0.394 1.000 0.383 0.175 0.143 0.190 0.176 0.062 0.209 0.222 0.117 0.197 0.079 0.219 0.481
XC -0.007 -0.021 0.310 0.383 1.000 0.129 0.180 0.177 0.213 -0.024 0.139 0.228 0.130 0.171 0.096 0.183 0.476
XD -0.040 -0.019 0.377 0.175 0.129 1.000 -0.010 0.310 0.259 0.239 0.165 0.128 0.305 0.125 0.122 0.159 0.489
XG -0.006 -0.008 0.068 0.143 0.180 -0.010 1.000 -0.053 0.014 -0.136 0.093 0.007 0.017 0.059 0.008 -0.035 0.082
XH -0.037 -0.032 0.318 0.190 0.177 0.310 -0.053 1.000 0.225 0.376 0.138 0.110 0.129 0.117 0.108 0.237 0.526
XJ -0.039 -0.020 0.340 0.176 0.213 0.259 0.014 0.225 1.000 0.093 0.165 0.194 0.222 0.148 0.141 0.214 0.561
XL -0.020 -0.041 0.133 0.062 -0.024 0.239 -0.136 0.376 0.093 1.000 0.007 -0.111 0.097 -0.088 0.047 0.214 0.394
XM -0.032 -0.008 0.322 0.209 0.139 0.165 0.093 0.138 0.165 0.007 1.000 0.254 0.089 0.192 0.212 0.110 0.410
XN -0.013 -0.006 0.412 0.222 0.228 0.128 0.007 0.110 0.194 -0.111 0.254 1.000 0.141 0.198 0.153 0.174 0.577
XP -0.015 -0.015 0.210 0.117 0.130 0.305 0.017 0.129 0.222 0.097 0.089 0.141 1.000 0.062 0.075 0.115 0.309
XR -0.029 0.002 0.399 0.197 0.171 0.125 0.059 0.117 0.148 -0.088 0.192 0.198 0.062 1.000 0.060 0.087 0.402
XS -0.009 -0.009 0.162 0.079 0.096 0.122 0.008 0.108 0.141 0.047 0.212 0.153 0.075 0.060 1.000 0.087 0.253
XV -0.016 -0.019 0.309 0.219 0.183 0.159 -0.035 0.237 0.214 0.214 0.110 0.174 0.115 0.087 0.087 1.000 0.406
polypharmacy -0.048 -0.036 0.767 0.481 0.476 0.489 0.082 0.526 0.561 0.394 0.410 0.577 0.309 0.402 0.253 0.406 1.000
[Figure: correlation heatmap]
atc_concept_id meddra_concept_id XA XB XC XD XG XH XJ XL XM XN XP XR XS XV polypharmacy
atc_concept_id 1.000 0.050 -0.271 -0.171 -0.204 -0.168 -0.150 -0.177 -0.145 -0.182 0.035 0.393 -0.003 -0.081 0.002 -0.028 -0.168
meddra_concept_id 0.050 1.000 -0.045 -0.025 -0.021 -0.016 -0.006 -0.058 -0.036 -0.109 -0.009 0.055 -0.019 0.005 -0.012 -0.022 -0.069
XA -0.271 -0.045 1.000 0.250 0.273 0.239 0.077 0.214 0.271 0.097 0.247 0.220 0.170 0.351 0.101 0.177 0.540
XB -0.171 -0.025 0.250 1.000 0.356 0.133 0.150 0.114 0.130 0.079 0.153 0.088 0.104 0.136 0.075 0.163 0.310
XC -0.204 -0.021 0.273 0.356 1.000 0.084 0.196 0.133 0.166 -0.026 0.112 0.150 0.118 0.188 0.043 0.169 0.356
XD -0.168 -0.016 0.239 0.133 0.084 1.000 -0.041 0.243 0.179 0.284 0.119 -0.004 0.256 0.104 0.095 0.133 0.320
XG -0.150 -0.006 0.077 0.150 0.196 -0.041 1.000 -0.067 0.008 -0.182 0.071 -0.013 0.004 0.069 0.012 -0.040 0.073
XH -0.177 -0.058 0.214 0.114 0.133 0.243 -0.067 1.000 0.183 0.377 0.107 -0.034 0.108 0.092 0.066 0.192 0.454
XJ -0.145 -0.036 0.271 0.130 0.166 0.179 0.008 0.183 1.000 0.067 0.152 0.055 0.204 0.153 0.076 0.162 0.406
XL -0.182 -0.109 0.097 0.079 -0.026 0.284 -0.182 0.377 0.067 1.000 -0.007 -0.261 0.099 -0.106 0.025 0.210 0.403
XM 0.035 -0.009 0.247 0.153 0.112 0.119 0.071 0.107 0.152 -0.007 1.000 0.191 0.089 0.180 0.200 0.103 0.308
XN 0.393 0.055 0.220 0.088 0.150 -0.004 -0.013 -0.034 0.055 -0.261 0.191 1.000 0.082 0.172 0.086 0.080 0.321
XP -0.003 -0.019 0.170 0.104 0.118 0.256 0.004 0.108 0.204 0.099 0.089 0.082 1.000 0.090 0.062 0.116 0.213
XR -0.081 0.005 0.351 0.136 0.188 0.104 0.069 0.092 0.153 -0.106 0.180 0.172 0.090 1.000 0.057 0.081 0.327
XS 0.002 -0.012 0.101 0.075 0.043 0.095 0.012 0.066 0.076 0.025 0.200 0.086 0.062 0.057 1.000 0.051 0.141
XV -0.028 -0.022 0.177 0.163 0.169 0.133 -0.040 0.192 0.162 0.210 0.103 0.080 0.116 0.081 0.051 1.000 0.223
polypharmacy -0.168 -0.069 0.540 0.310 0.356 0.320 0.073 0.454 0.406 0.403 0.308 0.321 0.213 0.327 0.141 0.223 1.000
[Figure: correlation heatmap]
atc_concept_id meddra_concept_id XA XB XC XD XG XH XJ XL XM XN XP XR XS XV polypharmacy
atc_concept_id 1.000 0.034 -0.212 -0.139 -0.163 -0.137 -0.123 -0.142 -0.113 -0.142 0.028 0.293 -0.002 -0.065 0.002 -0.023 -0.120
meddra_concept_id 0.034 1.000 -0.035 -0.020 -0.017 -0.013 -0.005 -0.046 -0.028 -0.083 -0.007 0.041 -0.015 0.004 -0.010 -0.018 -0.048
XA -0.212 -0.035 1.000 0.236 0.253 0.225 0.073 0.200 0.249 0.086 0.233 0.194 0.161 0.326 0.096 0.168 0.456
XB -0.139 -0.020 0.236 1.000 0.344 0.130 0.148 0.111 0.124 0.073 0.150 0.080 0.103 0.131 0.074 0.161 0.264
XC -0.163 -0.017 0.253 0.344 1.000 0.081 0.191 0.127 0.156 -0.023 0.108 0.135 0.115 0.178 0.042 0.164 0.299
XD -0.137 -0.013 0.225 0.130 0.081 1.000 -0.041 0.234 0.171 0.263 0.117 -0.005 0.253 0.100 0.094 0.131 0.274
XG -0.123 -0.005 0.073 0.148 0.191 -0.041 1.000 -0.065 0.007 -0.169 0.070 -0.012 0.004 0.067 0.012 -0.039 0.063
XH -0.142 -0.046 0.200 0.111 0.127 0.234 -0.065 1.000 0.172 0.345 0.104 -0.031 0.105 0.087 0.064 0.187 0.384
XJ -0.113 -0.028 0.249 0.124 0.156 0.171 0.007 0.172 1.000 0.060 0.145 0.048 0.197 0.143 0.073 0.155 0.339
XL -0.142 -0.083 0.086 0.073 -0.023 0.263 -0.169 0.345 0.060 1.000 -0.006 -0.225 0.092 -0.096 0.024 0.195 0.328
XM 0.028 -0.007 0.233 0.150 0.108 0.117 0.070 0.104 0.145 -0.006 1.000 0.174 0.088 0.173 0.198 0.102 0.262
XN 0.293 0.041 0.194 0.080 0.135 -0.005 -0.012 -0.031 0.048 -0.225 0.174 1.000 0.075 0.155 0.079 0.073 0.257
XP -0.002 -0.015 0.161 0.103 0.115 0.253 0.004 0.105 0.197 0.092 0.088 0.075 1.000 0.087 0.062 0.116 0.182
XR -0.065 0.004 0.326 0.131 0.178 0.100 0.067 0.087 0.143 -0.096 0.173 0.155 0.087 1.000 0.055 0.078 0.273
XS 0.002 -0.010 0.096 0.074 0.042 0.094 0.012 0.064 0.073 0.024 0.198 0.079 0.062 0.055 1.000 0.051 0.120
XV -0.023 -0.018 0.168 0.161 0.164 0.131 -0.039 0.187 0.155 0.195 0.102 0.073 0.116 0.078 0.051 1.000 0.191
polypharmacy -0.120 -0.048 0.456 0.264 0.299 0.274 0.063 0.384 0.339 0.328 0.262 0.257 0.182 0.273 0.120 0.191 1.000
[Figure: correlation heatmap]
atc_concept_id meddra_concept_id nichd sex reporter_qualification XA XB XC XD XG XH XJ XL XM XN XP XR XS XV polypharmacy
atc_concept_id 1.000 0.030 0.049 0.031 0.044 0.038 0.032 0.026 0.043 0.021 0.033 0.036 0.040 0.036 0.034 0.014 0.064 0.013 0.028 0.039
meddra_concept_id 0.030 1.000 0.066 0.013 0.032 0.024 0.027 0.034 0.048 0.020 0.019 0.021 0.038 0.025 0.047 0.010 0.029 0.012 0.032 0.032
nichd 0.049 0.066 1.000 0.145 0.109 0.089 0.068 0.134 0.140 0.124 0.092 0.096 0.149 0.052 0.117 0.047 0.071 0.035 0.068 0.126
sex 0.031 0.013 0.145 1.000 0.046 0.085 0.077 0.081 0.054 0.154 0.037 0.069 0.083 0.088 0.075 0.016 0.042 0.043 0.061 0.090
reporter_qualification 0.044 0.032 0.109 0.046 1.000 0.156 0.081 0.192 0.103 0.216 0.242 0.153 0.270 0.075 0.137 0.056 0.083 0.039 0.053 0.201
XA 0.038 0.024 0.089 0.085 0.156 1.000 0.615 0.540 0.465 0.129 0.631 0.514 0.309 0.382 0.699 0.292 0.558 0.398 0.449 0.859
XB 0.032 0.027 0.068 0.077 0.081 0.615 1.000 0.410 0.166 0.123 0.325 0.258 0.182 0.437 0.375 0.257 0.362 0.220 0.259 0.600
XC 0.026 0.034 0.134 0.081 0.192 0.540 0.410 1.000 0.212 0.218 0.471 0.456 0.158 0.235 0.590 0.193 0.258 0.366 0.251 0.692
XD 0.043 0.048 0.140 0.054 0.103 0.465 0.166 0.212 1.000 0.100 0.259 0.349 0.219 0.168 0.250 0.360 0.186 0.157 0.250 0.635
XG 0.021 0.020 0.124 0.154 0.216 0.129 0.123 0.218 0.100 1.000 0.101 0.154 0.169 0.165 0.222 0.063 0.107 0.017 0.029 0.245
XH 0.033 0.019 0.092 0.037 0.242 0.631 0.325 0.471 0.259 0.101 1.000 0.419 0.476 0.163 0.619 0.146 0.292 0.310 0.193 0.758
XJ 0.036 0.021 0.096 0.069 0.153 0.514 0.258 0.456 0.349 0.154 0.419 1.000 0.285 0.192 0.521 0.297 0.243 0.342 0.338 0.731
XL 0.040 0.038 0.149 0.083 0.270 0.309 0.182 0.158 0.219 0.169 0.476 0.285 1.000 0.092 0.278 0.150 0.150 0.113 0.239 0.490
XM 0.036 0.025 0.052 0.088 0.075 0.382 0.437 0.235 0.168 0.165 0.163 0.192 0.092 1.000 0.285 0.168 0.234 0.313 0.125 0.378
XN 0.034 0.047 0.117 0.075 0.137 0.699 0.375 0.590 0.250 0.222 0.619 0.521 0.278 0.285 1.000 0.234 0.327 0.520 0.274 0.833
XP 0.014 0.010 0.047 0.016 0.056 0.292 0.257 0.193 0.360 0.063 0.146 0.297 0.150 0.168 0.234 1.000 0.124 0.186 0.097 0.396
XR 0.064 0.029 0.071 0.042 0.083 0.558 0.362 0.258 0.186 0.107 0.292 0.243 0.150 0.234 0.327 0.124 1.000 0.111 0.155 0.507
XS 0.013 0.012 0.035 0.043 0.039 0.398 0.220 0.366 0.157 0.017 0.310 0.342 0.113 0.313 0.520 0.186 0.111 1.000 0.270 0.572
XV 0.028 0.032 0.068 0.061 0.053 0.449 0.259 0.251 0.250 0.029 0.193 0.338 0.239 0.125 0.274 0.097 0.155 0.270 1.000 0.608
polypharmacy 0.039 0.032 0.126 0.090 0.201 0.859 0.600 0.692 0.635 0.245 0.758 0.731 0.490 0.378 0.833 0.396 0.507 0.572 0.608 1.000
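
The matrix above appears to be the one behind the "highly overall correlated" alerts. As a hedged illustration, the sketch below takes the polypharmacy row of this matrix and lists the fields whose coefficient is at least 0.5. The 0.5 cut-off is an assumption (the report does not state its threshold), chosen because it reproduces the alert "polypharmacy is highly overall correlated with XA and 9 other fields".

```python
# Assumed alert threshold; not stated anywhere in the report itself.
THRESHOLD = 0.5

# polypharmacy row of the matrix above (self-correlation omitted).
polypharmacy_row = {
    "atc_concept_id": 0.039, "meddra_concept_id": 0.032, "nichd": 0.126,
    "sex": 0.090, "reporter_qualification": 0.201,
    "XA": 0.859, "XB": 0.600, "XC": 0.692, "XD": 0.635, "XG": 0.245,
    "XH": 0.758, "XJ": 0.731, "XL": 0.490, "XM": 0.378, "XN": 0.833,
    "XP": 0.396, "XR": 0.507, "XS": 0.572, "XV": 0.608,
}

# Keep every field at or above the assumed threshold, in column order.
flagged = [name for name, value in polypharmacy_row.items() if value >= THRESHOLD]

print(len(flagged), flagged)
```

Under this assumption XL (0.490) falls just below the cut-off, which is consistent with XL having no High correlation alert.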
[Figure: correlation heatmap]
XB XD XG XH XM XP XR XS XV nichd reporter_qualification sex
XB 1.000 0.099 0.083 0.227 0.170 0.110 0.200 0.081 0.169 0.040 0.055 0.055
XD 0.099 1.000 0.064 0.169 0.101 0.259 0.095 0.094 0.174 0.049 0.066 0.058
XG 0.083 0.064 1.000 0.038 0.112 0.047 0.044 0.011 0.023 0.079 0.082 0.188
XH 0.227 0.169 0.038 1.000 0.111 0.110 0.125 0.216 0.159 0.058 0.092 0.045
XM 0.170 0.101 0.112 0.111 1.000 0.070 0.125 0.118 0.081 0.031 0.050 0.063
XP 0.110 0.259 0.047 0.110 0.070 1.000 0.074 0.078 0.092 0.031 0.042 0.027
XR 0.200 0.095 0.044 0.125 0.125 0.074 1.000 0.059 0.093 0.036 0.034 0.032
XS 0.081 0.094 0.011 0.216 0.118 0.078 0.059 1.000 0.177 0.021 0.026 0.031
XV 0.169 0.174 0.023 0.159 0.081 0.092 0.093 0.177 1.000 0.046 0.043 0.040
nichd 0.040 0.049 0.079 0.058 0.031 0.031 0.036 0.021 0.046 1.000 0.069 0.155
reporter_qualification 0.055 0.066 0.082 0.092 0.050 0.042 0.034 0.026 0.043 0.069 1.000 0.056
sex 0.055 0.058 0.188 0.045 0.063 0.027 0.032 0.031 0.040 0.155 0.056 1.000

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

safetyreportid ade atc_concept_id meddra_concept_id nichd sex reporter_qualification receive_date XA XB XC XD XG XH XJ XL XM XN XP XR XS XV polypharmacy
0 10003357 21602735_36717998 21602735 36717998 middle_childhood Male Other health professional 2014-03-12 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 2
1 10003357 21602735_42890355 21602735 42890355 middle_childhood Male Other health professional 2014-03-12 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 2
2 10003357 21602735_36718024 21602735 36718024 middle_childhood Male Other health professional 2014-03-12 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 2
3 10003357 21603927_36717998 21603927 36717998 middle_childhood Male Other health professional 2014-03-12 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 2
4 10003357 21603927_42890355 21603927 42890355 middle_childhood Male Other health professional 2014-03-12 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 2
5 10003357 21603927_36718024 21603927 36718024 middle_childhood Male Other health professional 2014-03-12 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 2
6 10003388 21600449_35205038 21600449 35205038 late_adolescence Female Consumer or non-health professional 2014-03-12 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1
7 10003388 21600449_35205040 21600449 35205040 late_adolescence Female Consumer or non-health professional 2014-03-12 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1
8 10003388 21600449_35809225 21600449 35809225 late_adolescence Female Consumer or non-health professional 2014-03-12 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1
9 10003401 21600449_36110708 21600449 36110708 early_adolescence Female Consumer or non-health professional 2014-03-12 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1
safetyreportid ade atc_concept_id meddra_concept_id nichd sex reporter_qualification receive_date XA XB XC XD XG XH XJ XL XM XN XP XR XS XV polypharmacy
2326373 9999671 21603448_35104834 21603448 35104834 middle_childhood Male Physician 2014-03-12 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.0 0.0 0.0 4
2326374 9999671 21603497_35104834 21603497 35104834 middle_childhood Male Physician 2014-03-12 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.0 0.0 0.0 4
2326375 9999797 21601003_36111047 21601003 36111047 early_childhood Male Other health professional 2014-03-12 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3
2326376 9999797 21600448_36111047 21600448 36111047 early_childhood Male Other health professional 2014-03-12 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3
2326377 9999797 21602653_36111047 21602653 36111047 early_childhood Male Other health professional 2014-03-12 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3
2326378 9999818 40252350_45885779 40252350 45885779 early_adolescence Male Other health professional 2014-03-12 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1
2326379 9999818 40252350_37522220 40252350 37522220 early_adolescence Male Other health professional 2014-03-12 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1
2326380 9999818 40252350_35104691 40252350 35104691 early_adolescence Male Other health professional 2014-03-12 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1
2326381 9999935 21601003_36718554 21601003 36718554 early_adolescence Female Consumer or non-health professional 2014-03-12 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2
2326382 9999935 21600449_36718554 21600449 36718554 early_adolescence Female Consumer or non-health professional 2014-03-12 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2
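
Two regularities are visible in the sample rows: the `ade` key looks like `atc_concept_id` and `meddra_concept_id` joined by an underscore, and `polypharmacy` equals the sum of the X* columns. The sketch below checks both on three rows copied from the sample. It only demonstrates the pattern for those rows; whether it holds across the full dataset is not something the report confirms.

```python
# Three rows copied from the sample; only the non-zero X* columns are listed.
# Fields: ade, atc_concept_id, meddra_concept_id, non-zero X* counts, polypharmacy
rows = [
    ("21602735_36717998", 21602735, 36717998, {"XH": 1.0, "XL": 1.0}, 2),
    ("21603448_35104834", 21603448, 35104834, {"XA": 1.0, "XR": 3.0}, 4),
    ("21601003_36111047", 21601003, 36111047, {"XB": 1.0, "XC": 1.0, "XG": 1.0}, 3),
]

results = []
for ade, atc_id, meddra_id, class_counts, polypharmacy in rows:
    key_matches = ade == f"{atc_id}_{meddra_id}"
    sum_matches = sum(class_counts.values()) == polypharmacy
    results.append((key_matches, sum_matches))

print(results)
```

If the sum relationship does hold in general, it would go some way towards explaining why polypharmacy correlates strongly with so many of the X* columns.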